- 
                Notifications
    
You must be signed in to change notification settings  - Fork 2.8k
 
[NPUW] Phi-3 2K accuracy fix #32426
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NPUW] Phi-3 2K accuracy fix #32426
Conversation
2a1bacc    to
    62a8876      
    Compare
  
    There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please try to minimize the diff
| public: | ||
| OPENVINO_MATCHER_PASS_RTTI("npuw::LLMCompiledModel::Phi3SlidingMask"); | ||
| 
               | 
          ||
| explicit Phi3SlidingMask() { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems in this particular case explicit is not required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
| // Searching for Phi3 sliding mask pattern to extend it to work with right-padded | ||
| // past tokens and left-padded present tokens. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // Searching for Phi3 sliding mask pattern to extend it to work with right-padded | |
| // past tokens and left-padded present tokens. | |
| // Search for the Phi3 sliding mask pattern to extend it to work with right-padded | |
| // past tokens and left-padded present tokens. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
| // past tokens = 3 | ||
| // present tokens = 5 (starting with current_pos_id = 3) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, how can we get 5 present tokens with 3 past tokens?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be done in speculative decoding case, for example
However, 3 and 5 case was described as general simple case for dynamic shapes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you introduced a new pass, why there are changes to the existing code? Have you also moved stuff around?
Since the changes are quite complex nowadays, I'd avoid the unnecessary diffs to focus only on the relevant ones.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will do!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
f7e27e9    to
    4fed7fb      
    Compare
  
    | 
               | 
          ||
| LOG_DEBUG("Try patch Phi-3 sliding window mask, if it exists."); | ||
| patch_phi3_sliding_mask(kvcache_model); | ||
| ov::save_model(kvcache_model, "swa_patched.xml"); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks!
6fadef6    to
    8b1a295      
    Compare
  
    c4bb342    to
    b142dc0      
    Compare
  
    
Details:
Phi3SlidingMaskpass that replaces formula :to
to create correct final sliding window mask for left-padded present tokens and right-padded past ones
Tickets: